Closes #16798
I ran this PR with the q8_0 by DevQuasar and it seems to be working. Full command and perplexity results (they look fine) here: https://huggingface.co/DevQuasar/MiniMaxAI.MiniMax-M2-GGUF/discussions/1
CISC left a comment:
Remove the vocab files and test; if there is a good reason to test the vocab (which AFAICT there is not), we can add it to ggml-org/vocabs on HF.
Tool calls don't work yet? Or is that just this particular GGUF (from bullerwins)?
Done.
Argh, stupid codespaces. @CISC rebased on current master, should be OK now.
@pwilkin Fantastic work, and thanks as always for your open source work!
Is it worth merging if this does not work?
I think the jinja template works if you just remove
@CISC Yep, I normally just remove it for now.
It's weird, too; I don't understand why some are using it in their templates when it's the default. Makes no sense...
CISC left a comment:
Ready to merge when CIs are done.
@CISC looks OK to me; the failures are unrelated (webgpu).
Seems similar to gpt-oss in this regard, except for all messages and not just tool calls. It should work if clients pass back assistant messages with
Guys, any idea why the thinking tag still didn't get fixed?
Once --reasoning-format none is set on the backend, everything should work, as the reasoning content will be passed back to the server; the rest is purely cosmetic, like adding a lightweight, dedicated front-end filter or toggle to handle multiple <think>...</think> blocks gracefully. We could take a more modular approach: the backend could properly parse the blocks and send alternating delta reasoning_content / delta content, while a simple front-end option could resend "reasoning_content as content" with a configurable delimiter. It would fit nicely within the OpenAI-compat layer. It might be a bit of over-engineering, but it would cover all possible cases without needing any additional parsing logic or front-end hacks.
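The "alternating delta reasoning_content / delta content" idea above can be sketched as a small parser that splits a raw response into ordered reasoning/content segments on <think>...</think> markers. This is a minimal illustration, not llama.cpp's actual parser; the function name and return shape are made up.

```python
import re

# Matches one <think>...</think> block, including across newlines.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(raw: str) -> list[tuple[str, str]]:
    """Hypothetical helper: split a model response into an ordered list
    of ("reasoning", text) / ("content", text) segments."""
    segments = []
    pos = 0
    for m in THINK_BLOCK.finditer(raw):
        if m.start() > pos:                      # plain content before the block
            segments.append(("content", raw[pos:m.start()]))
        segments.append(("reasoning", m.group(1)))
        pos = m.end()
    if pos < len(raw):                           # trailing content after the last block
        segments.append(("content", raw[pos:]))
    return segments

# Example: a response with two interleaved thinking blocks, as MiniMax-M2 emits.
raw = "<think>plan step</think>Answer part 1<think>revise</think>Answer part 2"
print(split_reasoning(raw))
```

A server could then stream each segment as a delta of the matching field, while a front-end toggle could instead re-join the reasoning segments into content with a configurable delimiter.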
Adding --reasoning-format none still results in missing think tags.
OK: https://huggingface.co/MiniMaxAI/MiniMax-M2/blob/main/chat_template.jinja. MiniMax-M2 is the first model that actually requires this behavior (the reasoning_content must be preserved in context), so it deserves its own special option.
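A hedged sketch of what "reasoning_content must be preserved in context" means on the client side: when resending history, the assistant's reasoning_content field is kept alongside content instead of being stripped, so the template can re-emit the <think> blocks. Field names follow the OpenAI-compatible schema; the messages themselves are invented for illustration.

```python
# Illustrative message history only; not llama.cpp code.
history = [
    {"role": "user", "content": "What is 2 + 2?"},
    {
        "role": "assistant",
        # Kept and resent on later turns, as the MiniMax-M2 template expects.
        "reasoning_content": "The user asks for a simple sum; 2 + 2 = 4.",
        "content": "4",
    },
    # Next turn: the assistant message above is passed back verbatim,
    # reasoning_content included, rather than content-only.
    {"role": "user", "content": "And doubled?"},
]

# A client that strips reasoning_content (the usual default for most
# models) would break the context MiniMax-M2 expects.
kept = [m for m in history if "reasoning_content" in m]
print(len(kept))  # → 1
```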
I don't know why they merged this PR; IMO this is not good.
Not everything has to (or should) be done in a single PR. |
For anyone interested in enabling tool calls for Minimax M2, refer to PR #16932 — I’ve managed to get tool calls working. |
* Model: Minimax M2
* Cleanup
* Cleanup pt. 2
* Cleanup pt. 3
* Update convert_hf_to_gguf_update.py - merge catch blocks
* Remove vocab models and test
* Remove all redundant hparam settings covered by TextModel
* Move super to start, don't set block_count
* Update src/llama-model.cpp
* Update gguf-py/gguf/constants.py

Co-authored-by: Sigbjørn Skjæret <sigbjorn.skjaeret@scala.com>




Implementation for Minimax M2. Not doing the chat template yet, because I'm not sure how to handle the interleaved thinking blocks.